Introduction

  • This notebook explores in detail what can be done with the data provided about all players.
  • It contains an in-depth analysis of some of the main data features.
  • Interesting Ideas:
    • Come up with a dream team (the best player in every position)
    • Top 3 clubs that are good in 'Attacking' and their top 3 contributors
    • Top 3 clubs that are good in 'Defense' and their top 3 contributors
    • Best club overall and its top 3 contributors
    • Top 3 nations that have the best footballers
    • Next best player according to wages
  • How wages and players are related
  • Players whose performance declines with age
  • Create an ID card showing a player's skills, team, name, weight, height and income
  • Group subsets of skills under common skill categories
  • Which position should a player train for to make it quickly?

Information about the data given

  • Columns
    • (unnamed first column): row number
    • ID: unique ID for every player
    • Name: name
    • Age: age
    • Photo: URL of the player's photo
    • Nationality: nationality
    • Flag: URL of the player's country flag
    • Overall: overall rating
    • Potential: potential rating
    • Club: current club
    • Club Logo: URL of the club logo
    • Value: current market value
    • Wage: current wage
    • Special
    • Preferred Foot: left/right
    • International Reputation: rating on a scale of 5
    • Weak Foot: rating on a scale of 5
    • Skill Moves: rating on a scale of 5
    • Work Rate: attack work rate/defence work rate
    • Body Type: body type of the player
    • Real Face
    • Position: position on the pitch
    • Jersey Number: jersey number
    • Joined: date the player joined the club
    • Loaned From: name of the lending club, if applicable
    • Contract Valid Until: contract end date
    • Height: height of the player (feet'inches, e.g. 5'11)
    • Weight: weight of the player (in lbs)
    • LS: rating on a scale of 100
    • ST: rating on a scale of 100
    • RS: rating on a scale of 100
    • LW: rating on a scale of 100
    • LF: rating on a scale of 100
    • CF: rating on a scale of 100
    • RF: rating on a scale of 100
    • RW: rating on a scale of 100
    • LAM: rating on a scale of 100
    • CAM: rating on a scale of 100
    • RAM: rating on a scale of 100
    • LM: rating on a scale of 100
    • LCM: rating on a scale of 100
    • CM: rating on a scale of 100
    • RCM: rating on a scale of 100
    • RM: rating on a scale of 100
    • LWB: rating on a scale of 100
    • LDM: rating on a scale of 100
    • CDM: rating on a scale of 100
    • RDM: rating on a scale of 100
    • RWB: rating on a scale of 100
    • LB: rating on a scale of 100
    • LCB: rating on a scale of 100
    • CB: rating on a scale of 100
    • RCB: rating on a scale of 100
    • RB: rating on a scale of 100
    • Crossing: rating on a scale of 100
    • Finishing: rating on a scale of 100
    • HeadingAccuracy: rating on a scale of 100
    • ShortPassing: rating on a scale of 100
    • Volleys: rating on a scale of 100
    • Dribbling: rating on a scale of 100
    • Curve: rating on a scale of 100
    • FKAccuracy: rating on a scale of 100
    • LongPassing: rating on a scale of 100
    • BallControl: rating on a scale of 100
    • Acceleration: rating on a scale of 100
    • SprintSpeed: rating on a scale of 100
    • Agility: rating on a scale of 100
    • Reactions: rating on a scale of 100
    • Balance: rating on a scale of 100
    • ShotPower: rating on a scale of 100
    • Jumping: rating on a scale of 100
    • Stamina: rating on a scale of 100
    • Strength: rating on a scale of 100
    • LongShots: rating on a scale of 100
    • Aggression: rating on a scale of 100
    • Interceptions: rating on a scale of 100
    • Positioning: rating on a scale of 100
    • Vision: rating on a scale of 100
    • Penalties: rating on a scale of 100
    • Composure: rating on a scale of 100
    • Marking: rating on a scale of 100
    • StandingTackle: rating on a scale of 100
    • SlidingTackle: rating on a scale of 100
    • GKDiving: rating on a scale of 100
    • GKHandling: rating on a scale of 100
    • GKKicking: rating on a scale of 100
    • GKPositioning: rating on a scale of 100
    • GKReflexes: rating on a scale of 100
    • Release Clause: release clause value

Importing the packages needed

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
from math import pi

Load and prepare the data

In [2]:
data_path = "data.csv"
df = pd.read_csv(data_path)

Initial Data Inspection

In [3]:
df.columns
Out[3]:
Index(['Unnamed: 0', 'ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag',
       'Overall', 'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',
       'Preferred Foot', 'International Reputation', 'Weak Foot',
       'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',
       'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until',
       'Height', 'Weight', 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW',
       'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM',
       'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB', 'Crossing',
       'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
       'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling',
       'GKKicking', 'GKPositioning', 'GKReflexes', 'Release Clause'],
      dtype='object')
In [4]:
df.head()
Out[4]:
Unnamed: 0 ID Name Age Photo Nationality Flag Overall Potential Club ... Composure Marking StandingTackle SlidingTackle GKDiving GKHandling GKKicking GKPositioning GKReflexes Release Clause
0 0 158023 L. Messi 31 https://cdn.sofifa.org/players/4/19/158023.png Argentina https://cdn.sofifa.org/flags/52.png 94 94 FC Barcelona ... 96.0 33.0 28.0 26.0 6.0 11.0 15.0 14.0 8.0 €226.5M
1 1 20801 Cristiano Ronaldo 33 https://cdn.sofifa.org/players/4/19/20801.png Portugal https://cdn.sofifa.org/flags/38.png 94 94 Juventus ... 95.0 28.0 31.0 23.0 7.0 11.0 15.0 14.0 11.0 €127.1M
2 2 190871 Neymar Jr 26 https://cdn.sofifa.org/players/4/19/190871.png Brazil https://cdn.sofifa.org/flags/54.png 92 93 Paris Saint-Germain ... 94.0 27.0 24.0 33.0 9.0 9.0 15.0 15.0 11.0 €228.1M
3 3 193080 De Gea 27 https://cdn.sofifa.org/players/4/19/193080.png Spain https://cdn.sofifa.org/flags/45.png 91 93 Manchester United ... 68.0 15.0 21.0 13.0 90.0 85.0 87.0 88.0 94.0 €138.6M
4 4 192985 K. De Bruyne 27 https://cdn.sofifa.org/players/4/19/192985.png Belgium https://cdn.sofifa.org/flags/7.png 91 92 Manchester City ... 88.0 68.0 58.0 51.0 15.0 13.0 5.0 10.0 13.0 €196.4M

5 rows × 89 columns

In [5]:
df.describe()
Out[5]:
Unnamed: 0 ID Age Overall Potential Special International Reputation Weak Foot Skill Moves Jersey Number ... Penalties Composure Marking StandingTackle SlidingTackle GKDiving GKHandling GKKicking GKPositioning GKReflexes
count 18207.000000 18207.000000 18207.000000 18207.000000 18207.000000 18207.000000 18159.000000 18159.000000 18159.000000 18147.000000 ... 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000 18159.000000
mean 9103.000000 214298.338606 25.122206 66.238699 71.307299 1597.809908 1.113222 2.947299 2.361308 19.546096 ... 48.548598 58.648274 47.281623 47.697836 45.661435 16.616223 16.391596 16.232061 16.388898 16.710887
std 5256.052511 29965.244204 4.669943 6.908930 6.136496 272.586016 0.394031 0.660456 0.756164 15.947765 ... 15.704053 11.436133 19.904397 21.664004 21.289135 17.695349 16.906900 16.502864 17.034669 17.955119
min 0.000000 16.000000 16.000000 46.000000 48.000000 731.000000 1.000000 1.000000 1.000000 1.000000 ... 5.000000 3.000000 3.000000 2.000000 3.000000 1.000000 1.000000 1.000000 1.000000 1.000000
25% 4551.500000 200315.500000 21.000000 62.000000 67.000000 1457.000000 1.000000 3.000000 2.000000 8.000000 ... 39.000000 51.000000 30.000000 27.000000 24.000000 8.000000 8.000000 8.000000 8.000000 8.000000
50% 9103.000000 221759.000000 25.000000 66.000000 71.000000 1635.000000 1.000000 3.000000 2.000000 17.000000 ... 49.000000 60.000000 53.000000 55.000000 52.000000 11.000000 11.000000 11.000000 11.000000 11.000000
75% 13654.500000 236529.500000 28.000000 71.000000 75.000000 1787.000000 1.000000 3.000000 3.000000 26.000000 ... 60.000000 67.000000 64.000000 66.000000 64.000000 14.000000 14.000000 14.000000 14.000000 14.000000
max 18206.000000 246620.000000 45.000000 94.000000 95.000000 2346.000000 5.000000 5.000000 5.000000 99.000000 ... 92.000000 96.000000 94.000000 93.000000 91.000000 90.000000 92.000000 91.000000 90.000000 94.000000

8 rows × 44 columns

In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18207 entries, 0 to 18206
Data columns (total 89 columns):
Unnamed: 0                  18207 non-null int64
ID                          18207 non-null int64
Name                        18207 non-null object
Age                         18207 non-null int64
Photo                       18207 non-null object
Nationality                 18207 non-null object
Flag                        18207 non-null object
Overall                     18207 non-null int64
Potential                   18207 non-null int64
Club                        17966 non-null object
Club Logo                   18207 non-null object
Value                       18207 non-null object
Wage                        18207 non-null object
Special                     18207 non-null int64
Preferred Foot              18159 non-null object
International Reputation    18159 non-null float64
Weak Foot                   18159 non-null float64
Skill Moves                 18159 non-null float64
Work Rate                   18159 non-null object
Body Type                   18159 non-null object
Real Face                   18159 non-null object
Position                    18147 non-null object
Jersey Number               18147 non-null float64
Joined                      16654 non-null object
Loaned From                 1264 non-null object
Contract Valid Until        17918 non-null object
Height                      18159 non-null object
Weight                      18159 non-null object
LS                          16122 non-null object
ST                          16122 non-null object
RS                          16122 non-null object
LW                          16122 non-null object
LF                          16122 non-null object
CF                          16122 non-null object
RF                          16122 non-null object
RW                          16122 non-null object
LAM                         16122 non-null object
CAM                         16122 non-null object
RAM                         16122 non-null object
LM                          16122 non-null object
LCM                         16122 non-null object
CM                          16122 non-null object
RCM                         16122 non-null object
RM                          16122 non-null object
LWB                         16122 non-null object
LDM                         16122 non-null object
CDM                         16122 non-null object
RDM                         16122 non-null object
RWB                         16122 non-null object
LB                          16122 non-null object
LCB                         16122 non-null object
CB                          16122 non-null object
RCB                         16122 non-null object
RB                          16122 non-null object
Crossing                    18159 non-null float64
Finishing                   18159 non-null float64
HeadingAccuracy             18159 non-null float64
ShortPassing                18159 non-null float64
Volleys                     18159 non-null float64
Dribbling                   18159 non-null float64
Curve                       18159 non-null float64
FKAccuracy                  18159 non-null float64
LongPassing                 18159 non-null float64
BallControl                 18159 non-null float64
Acceleration                18159 non-null float64
SprintSpeed                 18159 non-null float64
Agility                     18159 non-null float64
Reactions                   18159 non-null float64
Balance                     18159 non-null float64
ShotPower                   18159 non-null float64
Jumping                     18159 non-null float64
Stamina                     18159 non-null float64
Strength                    18159 non-null float64
LongShots                   18159 non-null float64
Aggression                  18159 non-null float64
Interceptions               18159 non-null float64
Positioning                 18159 non-null float64
Vision                      18159 non-null float64
Penalties                   18159 non-null float64
Composure                   18159 non-null float64
Marking                     18159 non-null float64
StandingTackle              18159 non-null float64
SlidingTackle               18159 non-null float64
GKDiving                    18159 non-null float64
GKHandling                  18159 non-null float64
GKKicking                   18159 non-null float64
GKPositioning               18159 non-null float64
GKReflexes                  18159 non-null float64
Release Clause              16643 non-null object
dtypes: float64(38), int64(6), object(45)
memory usage: 12.4+ MB
In [7]:
df.isna().sum()
Out[7]:
Unnamed: 0                      0
ID                              0
Name                            0
Age                             0
Photo                           0
Nationality                     0
Flag                            0
Overall                         0
Potential                       0
Club                          241
Club Logo                       0
Value                           0
Wage                            0
Special                         0
Preferred Foot                 48
International Reputation       48
Weak Foot                      48
Skill Moves                    48
Work Rate                      48
Body Type                      48
Real Face                      48
Position                       60
Jersey Number                  60
Joined                       1553
Loaned From                 16943
Contract Valid Until          289
Height                         48
Weight                         48
LS                           2085
ST                           2085
                            ...  
Dribbling                      48
Curve                          48
FKAccuracy                     48
LongPassing                    48
BallControl                    48
Acceleration                   48
SprintSpeed                    48
Agility                        48
Reactions                      48
Balance                        48
ShotPower                      48
Jumping                        48
Stamina                        48
Strength                       48
LongShots                      48
Aggression                     48
Interceptions                  48
Positioning                    48
Vision                         48
Penalties                      48
Composure                      48
Marking                        48
StandingTackle                 48
SlidingTackle                  48
GKDiving                       48
GKHandling                     48
GKKicking                      48
GKPositioning                  48
GKReflexes                     48
Release Clause               1564
Length: 89, dtype: int64
In [8]:
df.dropna().shape[0]
Out[8]:
0

Data Cleaning

Fill the NaN values

  • Fill all attributes with datatype "float64" with their mean
    • These attributes are mostly player skills
  • Use a sensible default value for features such as weight, height, contract date, club and wage
  • Fill the rest with 0
In [9]:
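# Fill every float64 column (mostly skill ratings) with its mean.
# This also covers 'Jersey Number', 'Skill Moves', 'Weak Foot' and
# 'International Reputation', so the explicit fills for those columns
# further down find no NaN values left and have no effect.
for feature in df.columns:
    if df[feature].dtype == 'float64':
        df[feature] = df[feature].fillna(df[feature].mean())

# Defaults for the remaining columns (assignment instead of chained
# inplace=True, which newer pandas versions no longer apply to df)
df['Weight'] = df['Weight'].fillna('200lbs')
df['Contract Valid Until'] = df['Contract Valid Until'].fillna(2019)
df['Height'] = df['Height'].fillna("5'11")
df['Loaned From'] = df['Loaned From'].fillna('None')
df['Joined'] = df['Joined'].fillna('Jul 1, 2018')
df['Jersey Number'] = df['Jersey Number'].fillna(8)
df['Body Type'] = df['Body Type'].fillna('Normal')
df['Position'] = df['Position'].fillna('ST')
df['Club'] = df['Club'].fillna('No Club')
df['Work Rate'] = df['Work Rate'].fillna('Medium/ Medium')
df['Skill Moves'] = df['Skill Moves'].fillna(df['Skill Moves'].median())
df['Weak Foot'] = df['Weak Foot'].fillna(3)
df['Preferred Foot'] = df['Preferred Foot'].fillna('Right')
df['International Reputation'] = df['International Reputation'].fillna(1)
df['Wage'] = df['Wage'].fillna('€200K')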

Fill the rest of the NaN data with 0

In [10]:
df.fillna(0, inplace = True)

Convert Value and Wage to numbers

In [11]:
def value_to_int(df_value):
    try:
        value = float(df_value[1:-1])
        suffix = df_value[-1:]
        if suffix == 'M':
            value = value * 1000000
        elif suffix == 'K':
            value = value * 1000
    except ValueError:
        value = 0
    return value

df['Value'] = df['Value'].apply(value_to_int)
df['Wage'] = df['Wage'].apply(value_to_int)
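A vectorised alternative, sketched below under the assumption that every amount looks like `€<number>` optionally followed by `K` or `M`, parses a whole column at once with pandas string methods. `parse_money` is a hypothetical helper name. Unlike the `[1:-1]` slice in value_to_int, it also handles an amount with no suffix such as `€500`, which value_to_int would read as 50 (should the data contain such values).

```python
import pandas as pd

def parse_money(s: pd.Series) -> pd.Series:
    """Hypothetical vectorised alternative to value_to_int: '€110.5M' -> 110500000.0."""
    # Capture the numeric part and an optional K/M suffix after the euro sign
    parts = s.str.extract(r'^€(?P<num>\d+(?:\.\d+)?)(?P<suffix>[KM]?)$')
    # An empty suffix means the amount is already in euros
    multiplier = parts['suffix'].map({'': 1, 'K': 1_000, 'M': 1_000_000})
    # Strings that do not match become NaN above; map them to 0 like value_to_int does
    return (pd.to_numeric(parts['num']) * multiplier).fillna(0)
```

Run on the raw strings, it could replace the two apply calls: `df['Value'] = parse_money(df['Value'])`.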

Strip the 'lbs' unit from Weight

In [12]:
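def weight_correction(weight):
    # Strip the trailing 'lbs' (e.g. '159lbs' -> 159.0); fall back to 0 if parsing fails
    try:
        value = float(weight[:-3])
    except (TypeError, ValueError):
        value = 0
    return value

df['Weight'] = df['Weight'].apply(weight_correction)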
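Height is still a string at this point. A minimal sketch for converting it to centimetres, assuming every value follows the feet'inches pattern of the `"5'11"` default used above; `height_to_cm` is a hypothetical helper, and strings that do not match the pattern become NaN:

```python
import pandas as pd

def height_to_cm(s: pd.Series) -> pd.Series:
    # Assumes heights are stored as feet'inches strings such as "5'11"
    parts = s.str.extract(r"^(?P<feet>\d+)'(?P<inches>\d+)$").astype(float)
    total_inches = parts['feet'] * 12 + parts['inches']
    return total_inches * 2.54  # 1 inch = 2.54 cm; unparseable values stay NaN
```

For example, `df['Height_cm'] = height_to_cm(df['Height'])` would add a numeric column (the column name is illustrative).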

Grouping similar skills together

In [13]:
# Each helper receives one player row (via df.apply(..., axis=1)) and
# returns the rounded mean of a group of related skills
def defending(data):
    return int(round(data[['Marking', 'StandingTackle',
                           'SlidingTackle']].mean()))

def general(data):
    return int(round(data[['HeadingAccuracy', 'Dribbling', 'Curve',
                           'BallControl']].mean()))

def mental(data):
    return int(round(data[['Aggression', 'Interceptions', 'Positioning',
                           'Vision', 'Composure']].mean()))

def passing(data):
    return int(round(data[['Crossing', 'ShortPassing',
                           'LongPassing']].mean()))

def mobility(data):
    return int(round(data[['Acceleration', 'SprintSpeed',
                           'Agility', 'Reactions']].mean()))

def power(data):
    return int(round(data[['Balance', 'Jumping', 'Stamina',
                           'Strength']].mean()))

def rating(data):
    return int(round(data[['Potential', 'Overall']].mean()))

def shooting(data):
    return int(round(data[['Finishing', 'Volleys', 'FKAccuracy',
                           'ShotPower', 'LongShots', 'Penalties']].mean()))
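The helpers above run once per row through `df.apply(..., axis=1)`, which is slow on 18,000 rows. A vectorised sketch of the same grouping follows; the `SKILL_GROUPS` dict and `add_skill_groups` are hypothetical names. `Series.round()` rounds halves to even, matching Python's `round()`, and `astype(int)` assumes no NaN values remain, which holds after the fill steps above.

```python
import pandas as pd

# Same groupings as the helper functions above
SKILL_GROUPS = {
    'Defending': ['Marking', 'StandingTackle', 'SlidingTackle'],
    'General': ['HeadingAccuracy', 'Dribbling', 'Curve', 'BallControl'],
    'Mental': ['Aggression', 'Interceptions', 'Positioning', 'Vision', 'Composure'],
    'Passing': ['Crossing', 'ShortPassing', 'LongPassing'],
    'Mobility': ['Acceleration', 'SprintSpeed', 'Agility', 'Reactions'],
    'Power': ['Balance', 'Jumping', 'Stamina', 'Strength'],
    'Rating': ['Potential', 'Overall'],
    'Shooting': ['Finishing', 'Volleys', 'FKAccuracy', 'ShotPower', 'LongShots', 'Penalties'],
}

def add_skill_groups(frame: pd.DataFrame, groups=SKILL_GROUPS) -> pd.DataFrame:
    """Return a copy of frame with one rounded-mean column per skill group."""
    out = frame.copy()
    for name, cols in groups.items():
        # Row-wise mean over the group's columns, rounded half-to-even
        out[name] = frame[cols].mean(axis=1).round().astype(int)
    return out
```

With it, the eight apply calls in the next cell would collapse to `df = add_skill_groups(df)`.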
In [14]:
# renaming a column
df.rename(columns={'Club Logo':'Club_Logo'}, inplace=True)

# adding these categories to the data

df['Defending'] = df.apply(defending, axis = 1)
df['General'] = df.apply(general, axis = 1)
df['Mental'] = df.apply(mental, axis = 1)
df['Passing'] = df.apply(passing, axis = 1)
df['Mobility'] = df.apply(mobility, axis = 1)
df['Power'] = df.apply(power, axis = 1)
df['Rating'] = df.apply(rating, axis = 1)
df['Shooting'] = df.apply(shooting, axis = 1)
In [15]:
players = df[['Name','Defending','General','Mental','Passing',
                'Mobility','Power','Rating','Shooting','Flag','Age',
                'Nationality', 'Photo', 'Club_Logo', 'Club']]

Skills/Position Analysis

Number of footballers available in each position

In [16]:
# Create the figure first so the count plot is drawn on it at the requested size
plt.figure(figsize = (20, 10))
ax = sns.countplot(x='Position', data=df, order = df['Position'].value_counts().index)
ax.set_title(label = 'Number of footballers available in each position', fontsize = 20)
plt.show()

Top 5 skills for each position

In [17]:
# A list, not a tuple: newer pandas rejects a tuple when selecting several columns
player_features = [
    'Acceleration', 'Aggression', 'Agility', 
    'Balance', 'BallControl', 'Composure', 
    'Crossing', 'Dribbling', 'FKAccuracy', 
    'Finishing', 'GKDiving', 'GKHandling', 
    'GKKicking', 'GKPositioning', 'GKReflexes', 
    'HeadingAccuracy', 'Interceptions', 'Jumping', 
    'LongPassing', 'LongShots', 'Marking', 'Penalties'
]

idx = 1
plt.figure(figsize=(15,45))
for position_name, features in df.groupby('Position')[player_features].mean().iterrows():
    top_features = dict(features.nlargest(5))
    
    # number of variables
    categories = list(top_features.keys())
    N = len(categories)

    # Repeat the first value at the end to close the circular graph
    values = list(top_features.values())
    values += values[:1]

    # Angle of each axis: divide the full circle by the number of variables
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]

    # Initialise the spider plot (10 x 3 grid, one panel per position)
    ax = plt.subplot(10, 3, idx, polar=True)

    # Draw one axis per variable and add the labels
    plt.xticks(angles[:-1], categories, color='grey', size=8)
    
    # Draw ylabels
    ax.set_rlabel_position(0)
    plt.yticks([25,50,75], ["25","50","75"], color="grey", size=7)
    plt.ylim(0,100)
    
    plt.subplots_adjust(hspace = 0.5)
    
    # Plot data
    ax.plot(angles, values, linewidth=1, linestyle='solid')

    # Fill area
    ax.fill(angles, values, 'b', alpha=0.1)
    
    plt.title(position_name, size=11, y=1.1)
    
    idx += 1
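To read the same information as numbers rather than as charts, a small sketch (the `top_features_per_group` name is hypothetical) returns the names of the `n` highest-scoring features for each group, using the same groupby-mean-nlargest steps as the loop above:

```python
import pandas as pd

def top_features_per_group(frame, group_col, features, n=5):
    # Mean of each feature per group, then the n highest-scoring feature names
    means = frame.groupby(group_col)[list(features)].mean()
    return {group: row.nlargest(n).index.tolist() for group, row in means.iterrows()}
```

Called as `top_features_per_group(df, 'Position', player_features)`, it would give one list per position.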

More stats on skills and special moves

In [18]:
sns.set(style = 'dark', palette = 'colorblind', color_codes = True)
x = df.Special
plt.figure(figsize = (12, 8))
ax = sns.distplot(x, bins = 50, kde = False, color = 'm')
ax.set_xlabel(xlabel = 'Special score range', fontsize = 16)
ax.set_ylabel(ylabel = 'Count of the Players',fontsize = 16)
ax.set_title(label = 'Histogram for the Speciality Scores of the Players', fontsize = 20)
plt.show()
In [19]:
df.plot(kind='scatter', x='Special', y='Skill Moves')
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a2e88e198>
In [20]:
sns.countplot(x = 'Skill Moves', data=df)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a2dfd6b00>
In [21]:
df['Skill Moves'].value_counts()
Out[21]:
2.000000    8565
3.000000    6600
1.000000    2026
4.000000     917
5.000000      51
2.361308      48
Name: Skill Moves, dtype: int64
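The 48 players with a value of 2.361308 are the rows whose missing Skill Moves was filled with the column mean by the float64 loop, which runs before the median fill and leaves it nothing to fill. For an ordinal rating like this, filling with the most frequent level keeps every value a valid rating. A minimal sketch on hypothetical toy values (`fill_ordinal` is an illustrative name):

```python
import numpy as np
import pandas as pd

def fill_ordinal(s: pd.Series) -> pd.Series:
    # Fill with the most frequent level so filled values stay valid ratings
    return s.fillna(s.mode().iloc[0])

skill_moves = pd.Series([1, 2, 2, 4, np.nan])
mean_filled = skill_moves.fillna(skill_moves.mean())  # last value becomes 2.25
mode_filled = fill_ordinal(skill_moves)               # last value becomes 2.0
```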

Top 5 nations with the most players rated 5 in Skill Moves

In [22]:
plt.rcParams['figure.figsize'] = (20, 10)
skill_df = df[df['Skill Moves'] == 5][['Name','Nationality']]
sns.countplot(x='Nationality', data=skill_df, order=skill_df.Nationality.value_counts().iloc[:5].index)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x115160160>

Awards Section

  • This section lists the top 5 (or top 3) contributors in several categories

Top footballer-producing nations

In [23]:
df.Nationality.value_counts().nlargest(5).plot(kind='bar')
#sns.countplot(x='Nationality', data =df, order=df.Nationality.value_counts().iloc[:5].index)
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x11518fba8>

Player weight distribution in the top 5 footballer-producing countries

In [24]:
countries = ['England', 'Germany', 'Spain', 'Argentina', 'France']
In [25]:
data_countries = df[df['Nationality'].isin(countries)]
In [26]:
plt.rcParams['figure.figsize'] = (12, 7)

ax = sns.violinplot(x = data_countries['Nationality'], y = data_countries['Weight'], palette = 'colorblind')
ax.set_xlabel(xlabel = 'Countries', fontsize = 9)
ax.set_ylabel(ylabel = 'Weight in lbs', fontsize = 9)
ax.set_title(label = 'Distribution of Weight of players from different countries', fontsize = 20)
Out[26]:
Text(0.5, 1.0, 'Distribution of Weight of players from different countries')

Club-level Analysis

In [91]:
import matplotlib.image as mpimg
import requests
def print_club_flag(clubs):
    fig = plt.figure(figsize=(10,10))
    for index, club in enumerate(clubs):
        logo = df[df['Club'] == club]['Club_Logo'].iloc[0]
        logo_image = "img_club_logo.jpg"
        logo_flag = requests.get(logo).content
        with open(logo_image, 'wb') as handler:
            handler.write(logo_flag)
        img=mpimg.imread(logo_image)
        ax = fig.add_subplot(1, 6, index+1, xticks=[], yticks=[])
        fig.tight_layout()
        ax.imshow(img, interpolation="lanczos")
        ax.set_title("%d. %s" %(index+1, club))
    
def print_national_flag(nations):
    fig = plt.figure(figsize=(10, 10))
    for index, nation in enumerate(nations):
        logo = df[df['Nationality'] == nation]['Flag'].iloc[0]
        logo_image = "img_nation_logo.jpg"
        logo_flag = requests.get(logo).content
        with open(logo_image, 'wb') as handler:
            handler.write(logo_flag)
        img=mpimg.imread(logo_image)
        ax = fig.add_subplot(1, 6, index+1, xticks=[], yticks=[])
        fig.tight_layout()
        ax.imshow(img, interpolation="lanczos")
        ax.set_title("%d. %s" %(index+1, nation))

Best football clubs

  • Here are the top 5 football clubs by the average overall rating of their players
In [92]:
d = {'Overall': 'Average_Rating'}
best_overall_club_df = df.groupby('Club').agg({'Overall':'mean'}).rename(columns=d)
clubs = best_overall_club_df.Average_Rating.nlargest(5).index

print_club_flag(clubs)

Clubs that have the best Attack

  • Here are the top 5 clubs that specialize in attack, ranked by the summed Shooting, Power and Passing scores of their players
In [93]:
attck_list = ['Shooting', 'Power', 'Passing']

best_attack_df = players.groupby('Club')[attck_list].sum().sum(axis=1)
clubs = best_attack_df.nlargest(5).index

print_club_flag(clubs)

Clubs that have the best Defense

In [94]:
best_defense_df = players.groupby('Club')['Defending'].sum()
clubs = best_defense_df.nlargest(5).index
print_club_flag(clubs)
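These club rankings sum scores over every player in a club, so larger squads score higher, and the 'No Club' placeholder (the 241 players without a club, grouped together during NaN filling) may rank near the top for that reason alone. The nation-level sums further down favour nations with many players in the same way. A sketch of an alternative that drops placeholder groups and averages per player instead (`rank_groups` is a hypothetical helper, shown on toy data):

```python
import pandas as pd

def rank_groups(frame, group_col, score_cols, exclude=('No Club',), n=5):
    """Rank groups by the average per-player sum of score_cols (hypothetical helper)."""
    # Drop placeholder groups created while filling missing values
    kept = frame[~frame[group_col].isin(list(exclude))]
    per_player = kept[score_cols].sum(axis=1)
    # Average per player, so a group is not rewarded just for being large
    return per_player.groupby(kept[group_col]).mean().nlargest(n)

# Toy data: club 'A' has three average defenders, 'B' one strong defender,
# and the 'No Club' placeholder ten weak ones
toy = pd.DataFrame({
    'Club': ['A'] * 3 + ['B'] + ['No Club'] * 10,
    'Defending': [50] * 3 + [80] + [40] * 10,
})

naive = toy.groupby('Club')['Defending'].sum().nlargest(2)  # No Club (400), A (150)
fair = rank_groups(toy, 'Club', ['Defending'], n=2)         # B (80.0), A (50.0)
```

On the real data this would be called as `rank_groups(players, 'Club', attck_list)`, or with `exclude=()` for nations. Averaging has its own weakness with very small groups, so it is best combined with a minimum group size.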

    

Nation-level Analysis

Best footballing nations

In [97]:
d = {'Overall': 'Average_Rating'}
best_overall_country_df = df.groupby('Nationality').agg({'Overall':'mean'}).rename(columns=d)
nations = best_overall_country_df.Average_Rating.nlargest(5).index
print_national_flag(nations)
/Users/sganesh/anaconda3/envs/tensorflow/lib/python3.5/site-packages/matplotlib/tight_layout.py:198: UserWarning: tight_layout cannot make axes width small enough to accommodate all axes decorations
  warnings.warn('tight_layout cannot make axes width small enough '
In [32]:
best_3_uae = df[df['Nationality'] == 'United Arab Emirates']['Overall'].nlargest(3)
print(best_3_uae)
uae_df = df[df['Nationality'] == 'United Arab Emirates']
uae_df[uae_df['Overall'].isin(best_3_uae)]['Name']
1170    77
Name: Overall, dtype: int64
Out[32]:
1170    O. Abdulrahman
Name: Name, dtype: object
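The cell above shows that the dataset holds only one player from the United Arab Emirates, so the nation appears to earn its place among the best on the strength of a single 77-rated player. A sketch that only ranks groups with a minimum number of players; `mean_with_min_count` and the default threshold of 50 are hypothetical choices, and the nations in the toy data are placeholders:

```python
import pandas as pd

def mean_with_min_count(frame, group_col, value_col, min_count=50, n=5):
    """Top-n group means, ignoring groups with fewer than min_count rows."""
    stats = frame.groupby(group_col)[value_col].agg(['mean', 'count'])
    # Only rank groups with enough players for their average to be meaningful
    return stats.loc[stats['count'] >= min_count, 'mean'].nlargest(n)

toy = pd.DataFrame({
    'Nationality': ['UAE', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Overall': [77, 70, 72, 74, 60, 60, 60],
})
top = mean_with_min_count(toy, 'Nationality', 'Overall', min_count=3)  # X (72.0), Y (60.0)
```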

Nations that have the best Attack

In [96]:
best_attack_nation_df = players.groupby('Nationality')[attck_list].sum().sum(axis=1)
nations = best_attack_nation_df.nlargest(5).index
print_national_flag(nations)

Nations that has the best Defense

In [90]:
best_defense_nation_df = players.groupby('Nationality')['Defending'].sum()
nations = best_defense_nation_df.nlargest(5).index
print_national_flag(nations)
In [35]:
import requests
import random
from math import pi

import matplotlib.image as mpimg
from matplotlib.offsetbox import (OffsetImage,AnnotationBbox)

def details(row, title, image, age, nationality, photo, logo, club):
    
    flag_image = "img_flag.jpg"
    player_image = "img_player.jpg"
    logo_image = "img_club_logo.jpg"
        
    img_flag = requests.get(image).content
    with open(flag_image, 'wb') as handler:
        handler.write(img_flag)
    
    player_img = requests.get(photo).content
    with open(player_image, 'wb') as handler:
        handler.write(player_img)
     
    logo_img = requests.get(logo).content
    with open(logo_image, 'wb') as handler:
        handler.write(logo_img)
        
    r = lambda: random.randint(0,255)
    colorRandom = '#%02X%02X%02X' % (r(),r(),r())
    
    # '%02X' produces upper-case hex, so compare against '#FFFFFF'
    if colorRandom == '#FFFFFF': colorRandom = '#a5d6a7'
    
    basic_color = '#37474f'
    color_annotate = '#01579b'
    
    img = mpimg.imread(flag_image)
    #flg_img = mpimg.imread(logo_image)
    
    plt.figure(figsize=(15,8))
    # Only the skill-group columns are plotted; the other columns of `players`
    # (Name, Flag, Age, Nationality, Photo, Club_Logo, Club) are card metadata
    categories = ['Defending', 'General', 'Mental', 'Passing',
                  'Mobility', 'Power', 'Rating', 'Shooting']
    N = len(categories)
    
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]
    
    ax = plt.subplot(111, projection='polar')
    ax.set_theta_offset(pi / 2)
    ax.set_theta_direction(-1)
    plt.xticks(angles[:-1], categories, color= 'black', size=17)
    ax.set_rlabel_position(0)
    plt.yticks([25,50,75,100], ["25","50","75","100"], color= basic_color, size= 10)
    plt.ylim(0,100)
    
    # Select the skills by column name; filtering by value would also drop any
    # skill score that happens to equal a metadata value such as the age
    values = players.loc[row, categories].tolist()
    values += values[:1]
    
    ax.plot(angles, values, color= basic_color, linewidth=1, linestyle='solid')
    ax.fill(angles, values, color= colorRandom, alpha=0.5)
    axes_coords = [0, 0, 1, 1]
    ax_image = plt.gcf().add_axes(axes_coords,zorder= -1)
    ax_image.imshow(img,alpha=0.5)
    ax_image.axis('off')
    
    ax.annotate('Nationality: ' + nationality.upper(), xy=(10,10), xytext=(103, 138),
                fontsize= 12,
                color = 'white',
                bbox={'facecolor': color_annotate, 'pad': 7})
                      
    ax.annotate('Age: ' + str(age), xy=(10,10), xytext=(43, 180),
                fontsize= 15,
                color = 'white',
                bbox={'facecolor': color_annotate, 'pad': 7})
    
    ax.annotate('Team: ' + club.upper(), xy=(10,10), xytext=(92, 168),
                fontsize= 12,
                color = 'white',
                bbox={'facecolor': color_annotate, 'pad': 7})

    arr_img_player = plt.imread(player_image, format='jpg')

    imagebox_player = OffsetImage(arr_img_player)
    imagebox_player.image.axes = ax
    abPlayer = AnnotationBbox(imagebox_player, (0.5, 0.7),
                        xybox=(313, 223),
                        xycoords='data',
                        boxcoords="offset points"
                        )
    arr_img_logo = plt.imread(logo_image, format='jpg')

    imagebox_logo = OffsetImage(arr_img_logo)
    imagebox_logo.image.axes = ax
    abLogo = AnnotationBbox(imagebox_logo, (0.5, 0.7),
                        xybox=(-320, -226),
                        xycoords='data',
                        boxcoords="offset points"
                        )

    ax.add_artist(abPlayer)
    ax.add_artist(abLogo)

    plt.title(title, size=50, color= basic_color)
In [36]:
# Draw the ID card (radar chart of the skill groups plus flag, photo and
# club logo) for the player at the given row position

def get_id_card(id = 0):
    if 0 <= id < len(players):
        details(row = players.index[id], 
                title = players['Name'][id], 
                age = players['Age'][id], 
                photo = players['Photo'][id],
                nationality = players['Nationality'][id],
                image = players['Flag'][id], 
                logo = players['Club_Logo'][id], 
                club = players['Club'][id])
    else:
        print('The dataset has %d players. Use an index from 0 to %d.'
              % (len(players), len(players) - 1))

Top 5 footballers

  • This gives a pictorial representation of the top 5 footballers
  • Thanks to Roshan Sharma for the ID card code. Really well done!
In [37]:
best_footballers = df['Overall'].nlargest(5)
for index in best_footballers.index:
    get_id_card(index)

Dream Team

  • Ever dreamt of a team which would have all your favourite players?
  • The team below has, for every position, the player with the highest Potential rating :)
In [38]:
df.loc[df.groupby(df['Position'])['Potential'].idxmax()][['Name', 'Position', 'Overall', 'Age', 'Nationality', 'Club']]
Out[38]:
     Name                 Position  Overall  Age  Nationality  Club
31   C. Eriksen           CAM       88       26   Denmark      Tottenham Hotspur
42   S. Umtiti            CB        87       24   France       FC Barcelona
27   Casemiro             CDM       88       26   Brazil       Real Madrid
350  A. Milik             CF        81       24   Poland       Napoli
78   S. Milinković-Savić  CM        85       23   Serbia       Lazio
3    De Gea               GK        91       27   Spain        Manchester United
28   J. Rodríguez         LAM       88       26   Colombia     FC Bayern München
35   Marcelo              LB        88       30   Brazil       Real Madrid
77   M. Škriniar          LCB       85       23   Slovakia     Inter
11   T. Kroos             LCM       90       28   Germany      Real Madrid
14   N. Kanté             LDM       89       27   France       Chelsea
15   P. Dybala            LF        89       24   Argentina    Juventus
415  H. Aouar             LM        80       20   France       Olympique Lyonnais
21   E. Cavani            LS        89       31   Uruguay      Paris Saint-Germain
2    Neymar Jr            LW        92       26   Brazil       Paris Saint-Germain
601  Jonny                LWB       79       24   Spain        Wolverhampton Wanderers
171  H. Ziyech            RAM       83       25   Morocco      Ajax
247  João Cancelo         RB        82       24   Portugal     Juventus
8    Sergio Ramos         RCB       91       32   Spain        Real Madrid
4    K. De Bruyne         RCM       91       27   Belgium      Manchester City
45   P. Pogba             RDM       87       25   France       Manchester United
0    L. Messi             RF        94       31   Argentina    FC Barcelona
25   K. Mbappé            RM        88       19   France       Paris Saint-Germain
7    L. Suárez            RS        91       31   Uruguay      FC Barcelona
79   Marco Asensio        RW        85       22   Spain        Real Madrid
766  Pablo Maffeo         RWB       78       20   Spain        VfB Stuttgart
1    Cristiano Ronaldo    ST        94       33   Portugal     Juventus
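The table above has one row per Position label (27 of them), so it is not an eleven-player side, and it ranks by Potential even though Overall is what is displayed. A minimal sketch of an actual XI follows, assuming a 4-3-3 written with the dataset's Position labels; `FORMATION_433` and `pick_dream_team` are hypothetical names chosen for illustration, and ranking by `Overall` is an assumption.

```python
from collections import Counter

import pandas as pd

# Hypothetical 4-3-3 line-up written with the dataset's Position labels
FORMATION_433 = ['GK', 'LB', 'LCB', 'RCB', 'RB',
                 'LCM', 'CDM', 'RCM', 'LW', 'ST', 'RW']


def pick_dream_team(frame: pd.DataFrame, formation=FORMATION_433,
                    rating: str = 'Overall') -> pd.DataFrame:
    """Fill each slot with the best-rated player listed at that position."""
    columns = ['Name', 'Position', rating]
    picks = []
    # Counter keeps first-seen order, so the output follows the formation
    for position, slots in Counter(formation).items():
        candidates = frame[frame['Position'] == position]
        if not candidates.empty:
            # nlargest handles repeated labels, e.g. two 'CB' slots
            picks.append(candidates.nlargest(slots, rating))
    if not picks:
        return frame.iloc[0:0][columns]
    return pd.concat(picks)[columns]
```

Calling `pick_dream_team(df)` would return at most eleven rows. Because every player has exactly one Position, no one can be picked twice.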

Wage Analysis

In [39]:
sns.set_theme(style='dark', palette='colorblind', color_codes=True)
x = df.Wage
plt.figure(figsize=(12, 8))
# histplot replaces the deprecated distplot and draws no KDE curve by default
ax = sns.histplot(x, bins=50, color='m')
ax.set_xlabel('Player Wage', fontsize=16)
ax.set_ylabel('Player Count', fontsize=16)
ax.set_title('Histogram of player wages', fontsize=20)
plt.show()

Inference

  • The majority of footballers fall in the lowest wage bins, possibly because the dataset has little wage information for lesser-known players
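The inference above can be checked with a few numbers instead of by eye. This is a minimal sketch that assumes `df.Wage` is already numeric, as the histogram requires; `wage_summary` is a hypothetical helper and `low_threshold` is whatever cut-off you consider "low wage" after looking at the histogram.

```python
import pandas as pd


def wage_summary(wages: pd.Series, low_threshold: float) -> dict:
    """Summarise how strongly wages are concentrated at the low end."""
    # Coerce to numbers and ignore anything that cannot be parsed
    wages = pd.to_numeric(wages, errors='coerce').dropna()
    return {
        'median': float(wages.median()),
        'mean': float(wages.mean()),
        # Fraction of players paid below the (hypothetical) threshold
        'share_below_threshold': float((wages < low_threshold).mean()),
        # Positive skew means a long tail of a few very high wages
        'skew': float(wages.skew()),
    }


# Example with made-up numbers: five low wages and one very high one
print(wage_summary(pd.Series([1, 1, 2, 2, 3, 50]), low_threshold=5))
```

A mean well above the median together with a positive skew is the numeric signature of the long right tail seen in the histogram.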

Next Steps

  • This section will cover further analysis and predictions, such as:
    • Who would be the next big star?
    • Which factors contribute to a higher salary?
    • Please comment with any other predictions or analyses you would like to see
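Two of the planned questions can be given a first-pass shape with simple heuristics. The sketch below is a starting point under stated assumptions, not a prediction model: the hypothetical helper `rising_stars` ranks young players by the gap between `Potential` and `Overall`, and the hypothetical helper `wage_drivers` ranks numeric columns by their Pearson correlation with `Wage`, assuming `Wage` has been converted to numbers. The age cut-off of 21 is an arbitrary choice.

```python
import pandas as pd


def rising_stars(frame: pd.DataFrame, max_age: int = 21,
                 top_n: int = 10) -> pd.DataFrame:
    """Young players with the largest gap between Potential and Overall."""
    young = frame[frame['Age'] <= max_age]
    young = young.assign(Growth=young['Potential'] - young['Overall'])
    # Ties on Growth are broken by the higher Potential
    return (young.sort_values(['Growth', 'Potential'], ascending=False)
                 .head(top_n)[['Name', 'Age', 'Overall', 'Potential', 'Growth']])


def wage_drivers(frame: pd.DataFrame, target: str = 'Wage',
                 top_n: int = 5) -> pd.Series:
    """Numeric columns ranked by the strength of their correlation with target."""
    numeric = frame.select_dtypes('number')
    corr = numeric.drop(columns=[target]).corrwith(numeric[target])
    # Rank by absolute strength but keep the sign in the result
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(top_n)
```

Correlation only shows association, so `wage_drivers(df)` is a way to shortlist features for a proper salary model rather than an answer in itself.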
In [40]:
# Outfield position labels to rank (each has a rating column of the same name)
positions = ['CAM', 'CB', 'CDM', 'CF', 'CM', 'LAM', 'LB', 'LCB', 'LCM',
             'LDM', 'LF', 'LM', 'LS', 'LW', 'LWB', 'RAM', 'RB', 'RCB',
             'RCM', 'RDM', 'RF', 'RM', 'RS', 'RW', 'RWB']
In [41]:
# Print the ten highest-rated players listed at each position
for pos in positions:
    print('\n\n', 'Top 10', pos, 'in FIFA 19', '\n')
    temp_df = df[df.Position == pos]
    # Ratings are strings such as '86+3', so this sort is lexicographic; it matches
    # numeric order only as long as every base rating has two digits
    print(temp_df.sort_values(pos, ascending=False).head(10).reset_index()[['Name', pos]])

 Top 10 CAM in FIFA 19 

              Name   CAM
0     A. Griezmann  86+3
1       C. Eriksen  86+3
2  Roberto Firmino  84+3
3          M. Özil  84+3
4         N. Fekir  83+3
5         D. Payet  82+3
6        T. Müller  82+3
7       J. Pastore  82+3
8       E. Lavezzi  81+3
9    R. Nainggolan  81+3


 Top 10 CB in FIFA 19 

          Name    CB
0     D. Godín  87+3
1   M. Benatia  84+3
2    S. Umtiti  84+3
3      Miranda  83+3
4   V. Kompany  83+3
5  N. Otamendi  82+3
6   S. de Vrij  82+3
7  A. Barzagli  82+3
8        Naldo  82+3
9      Marcano  81+3


 Top 10 CDM in FIFA 19 

               Name   CDM
0   Sergio Busquets  86+3
1          Casemiro  85+3
2       Fernandinho  83+3
3           Fabinho  83+3
4      K. Strootman  82+3
5  William Carvalho  82+3
6          N. Matić  82+3
7    Danilo Pereira  81+3
8     Javi Martínez  81+3
9         S. Nzonzi  81+2


 Top 10 CF in FIFA 19 

              Name    CF
0     Luis Alberto  82+2
1      S. Giovinco  81+2
2        L. Stindl  80+2
3          Raffael  80+2
4  Ricardo Goulart  78+2
5    G. dos Santos  75+3
6         A. Milik  75+3
7         Y. Ōsako  75+2
8          A. Ruiz  75+2
9       S. Okazaki  75+2


 Top 10 CM in FIFA 19 

                  Name    CM
0               Thiago  84+3
1  S. Milinković-Savić  83+2
2          I. Gündoğan  82+3
3             Jorginho  82+2
4             M. Götze  81+3
5          L. Goretzka  81+3
6           M. Dembélé  81+3
7       G. Bonaventura  81+3
8             N. Keïta  81+2
9           C. Tolisso  81+2


 Top 10 LAM in FIFA 19 

              Name   LAM
0     J. Rodríguez  85+3
1         D. Tadić  79+3
2   Fabrio Farinha  73+2
3  Leo Caldeirinha  73+2
4  Nicolás Formido  73+2
5   Leordinho Paes  71+2
6  Adrián Burnabão  71+2
7      D. Buitrago  70+2
8   Nilson Padilho  69+2
9    Marlion Rolim  69+2


 Top 10 LB in FIFA 19 

            Name    LB
0        Marcelo  84+3
1     Jordi Alba  84+3
2    Alex Sandro  83+3
3    Filipe Luís  82+3
4    Alex Telles  82+3
5       D. Alaba  82+3
6  Marcos Alonso  81+2
7   L. Hernández  81+2
8       B. Mendy  80+2
9   A. Robertson  80+2


 Top 10 LCB in FIFA 19 

            Name   LCB
0   G. Chiellini  86+3
1    V. van Dijk  85+3
2     M. Hummels  85+3
3   K. Koulibaly  84+3
4  J. Vertonghen  84+3
5     K. Manolas  83+3
6    M. Škriniar  83+2
7     A. Laporte  82+2
8   Luiz Gustavo  81+3
9       F. Fazio  81+3


 Top 10 LCM in FIFA 19 

              Name   LCM
0         T. Kroos  86+3
1      David Silva  85+3
2        M. Hamšík  84+3
3      M. Verratti  84+3
4          D. Alli  83+3
5       M. Kovačić  82+3
6  Bruno Fernandes  82+2
7        A. Witsel  81+3
8            Pizzi  81+3
9         K. Kampl  80+2


 Top 10 LDM in FIFA 19 

            Name   LDM
0       N. Kanté  87+3
1    Lucas Leiva  81+3
2       Paulinho  81+3
3     Marquinhos  80+3
4  J. Mascherano  79+3
5    M. Brozović  79+2
6       W. Ndidi  79+2
7     F. de Jong  78+2
8     G. Pizarro  78+2
9      R. Zobnin  78+2


 Top 10 LF in FIFA 19 

             Name    LF
0       E. Hazard  88+3
1       P. Dybala  86+3
2         Iniesta  81+3
3  Jonathan Viera  79+2
4       S. Blanco  74+2
5     J. Campbell  74+2
6       S. Araujo  73+2
7        P. Ebert  69+2
8   A. Trajkovski  69+2
9  Gabriel Xavier  68+2


 Top 10 LM in FIFA 19 

              Name    LM
0    Douglas Costa  84+3
1          M. Reus  84+3
2          S. Mané  83+3
3             Koke  83+3
4       I. Perišić  83+3
5       Y. Brahimi  82+3
6           H. Son  82+3
7         T. Lemar  82+3
8      Y. Carrasco  81+3
9  Felipe Anderson  81+3


 Top 10 LS in FIFA 19 

            Name    LS
0      E. Cavani  85+3
1     G. Higuaín  85+3
2    Diego Costa  82+3
3   M. Balotelli  80+3
4     A. Belotti  80+3
5  M. Arnautović  80+3
6  Gerard Moreno  80+2
7       C. Bacca  79+3
8    J. Martínez  79+2
9      E. Zahavi  78+2


 Top 10 LW in FIFA 19 

          Name    LW
0    Neymar Jr  89+3
1     Coutinho  86+3
2   L. Insigne  86+3
3         Isco  84+3
4      L. Sané  84+2
5   A. Martial  82+3
6   D. Perotti  80+3
7  M. Rashford  80+2
8     Williams  80+2
9    I. Piatti  80+2


 Top 10 LWB in FIFA 19 

           Name   LWB
0     J. Hector  78+3
1     N. Schulz  78+2
2        P. Max  78+2
3         Jonny  77+2
4     J. Mojica  76+2
5    B. Oczipka  76+2
6      K. Gibbs  75+2
7      E. Insúa  75+2
8  Aday Benítez  74+2
9   L. Vangioni  74+2


 Top 10 RAM in FIFA 19 

                Name   RAM
0        J. Cuadrado  81+3
1          H. Ziyech  81+3
2     Allan Bardinho  74+2
3       Jacson Zonta  72+2
4     Sebas Couteira  72+2
5     Kauã Abranches  72+2
6  Gustavo Lobateiro  72+2
7  Clayton Fildeiras  71+2
8   Fernando Canesín  71+2
9       Mauro Riboas  70+2


 Top 10 RB in FIFA 19 

              Name    RB
0      Azpilicueta  84+3
1        K. Walker  82+3
2         Carvajal  82+3
3    Sergi Roberto  81+3
4     João Cancelo  81+2
5  Mário Fernandes  80+2
6      K. Trippier  80+2
7      L. Piszczek  79+3
8        L. Bender  79+3
9      A. Florenzi  79+3


 Top 10 RCB in FIFA 19 

              Name   RCB
0     Sergio Ramos  87+3
1       L. Bonucci  84+3
2  T. Alderweireld  84+3
3     Thiago Silva  84+3
4            Piqué  83+3
5        R. Varane  83+3
6       J. Boateng  83+3
7       J. Giménez  83+2
8      Raúl Albiol  82+3
9             Pepe  82+3


 Top 10 RCM in FIFA 19 

             Name   RCM
0       L. Modrić  88+3
1    K. De Bruyne  87+3
2      I. Rakitić  84+3
3          Parejo  83+2
4            Saúl  82+3
5      S. Khedira  82+3
6      J. Kimmich  81+3
7  Manu Trigueros  81+2
8           Allan  81+2
9    J. Henderson  80+2


 Top 10 RDM in FIFA 19 

            Name   RDM
0    D. De Rossi  81+3
1       I. Gueye  81+2
2   Illarramendi  81+2
3      M. Parolo  80+2
4    A. Doucouré  80+2
5  T. Stepanenko  79+2
6       P. Pogba  78+3
7     I. Marcone  78+2
8     I. Denisov  78+2
9     M. de Roon  77+2


 Top 10 RF in FIFA 19 

                 Name    RF
0            L. Messi  93+2
1          D. Mertens  85+3
2           D. Valeri  78+2
3         L. Podolski  77+3
4            C. Ciano  72+2
5          P. Gerkens  70+2
6         C. Falletti  70+2
7         Zhang Xizhe  69+2
8  D. Moberg Karlsson  68+2
9            Rafa Mir  67+2


 Top 10 RM in FIFA 19 

              Name    RM
0        K. Mbappé  86+3
1         M. Salah  86+3
2       F. Thauvin  83+2
3      A. Di María  82+3
4         Quaresma  82+3
5        A. Robben  82+3
6        Q. Promes  82+2
7   Gelson Martins  81+3
8    José Callejón  81+3
9  F. Bernardeschi  81+2


 Top 10 RS in FIFA 19 

              Name    RS
0        L. Suárez  87+5
1   Z. Ibrahimović  82+4
2           Falcao  81+3
3    W. Ben Yedder  80+2
4          S. Zaza  79+2
5  F. Quagliarella  79+2
6        S. Haller  79+2
7        M. Marega  78+2
8     Sergi Enrich  78+2
9             Eder  77+3


 Top 10 RW in FIFA 19 

              Name    RW
0        R. Mahrez  84+3
1      R. Sterling  84+3
2   Bernardo Silva  84+2
3    Marco Asensio  83+3
4       O. Dembélé  83+3
5       A. Sánchez  82+3
6          Willian  82+3
7             Suso  82+3
8  Ronaldo Cabrais  82+2
9           Malcom  82+2


 Top 10 RWB in FIFA 19 

           Name   RWB
0    S. Coleman  79+2
1  P. Kadeřábek  78+2
2  D. Caligiuri  77+2
3  Pablo Maffeo  77+2
4       K. Lala  75+2
5    R. Aguilar  75+2
6      Barragán  75+2
7     M. Ginter  74+2
8       E. Durm  73+2
9    M. Doherty  73+2
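The position columns hold strings such as '86+3', so `sort_values` on them compares text rather than numbers. That happens to rank correctly while every base rating has two digits, but a numeric sort is sturdier. A minimal sketch follows, with hypothetical helpers `split_rating` and `top_by_position`; it does not try to interpret the `+N` suffix and only uses it as a tie-breaker.

```python
import pandas as pd


def split_rating(series: pd.Series) -> pd.DataFrame:
    """Split strings such as '86+3' into numeric base and bonus columns."""
    # Split on the first '+' only; reindex guarantees both columns exist
    parts = series.str.split('+', n=1, expand=True).reindex(columns=[0, 1])
    return pd.DataFrame({'base': pd.to_numeric(parts[0], errors='coerce'),
                         'bonus': pd.to_numeric(parts[1], errors='coerce')},
                        index=series.index)


def top_by_position(frame: pd.DataFrame, pos: str, n: int = 10) -> pd.DataFrame:
    """Top-n players listed at `pos`, ranked numerically by their `pos` rating."""
    ratings = split_rating(frame.loc[frame['Position'] == pos, pos])
    # Missing or unparsable ratings become NaN and sort last
    order = ratings.sort_values(['base', 'bonus'], ascending=False).index[:n]
    return frame.loc[order, ['Name', pos]]
```

With this helper, `top_by_position(df, 'CB')` would stay correct even if a single-digit base rating such as '9+1' appeared, which a plain string sort would wrongly place first.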